Systems and methods for structuring data from unstructured electronic data files

ABSTRACT

Computer-implemented systems and methods are disclosed for structuring data from unstructured electronic data files. In accordance with some embodiments, an electronic data file including unstructured content associated with a legal process return is received, and the unstructured content is parsed to identify one or more objects and properties based on a database ontology. The identified objects and properties are processed to generate an object model. A data report may be generated based on the identified objects and properties.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/923,712, filed Oct. 27, 2015, entitled “SYSTEMS AND METHODS FOR STRUCTURING DATA FROM UNSTRUCTURED ELECTRONIC DATA FILES,” which claims the benefit of U.S. Provisional Patent Application No. 62/214,856, filed Sep. 4, 2015, entitled “SYSTEMS AND METHODS FOR STRUCTURING DATA FROM UNSTRUCTURED ELECTRONIC DATA FILES,” which are incorporated herein in their entireties.

BACKGROUND

Law enforcement agencies increasingly rely on social media data to perform criminal investigations. An agency typically serves a search warrant, subpoena, or another type of legal process on a social media platform administrator, which provides a legal process return to the agency in response to the legal process. Legal process returns may be provided as electronic data files in a number of formats including, for example, PDF files, text files, spreadsheets, and database files. They can include information such as, for example, contact information, friend lists, private messages, public posts, “tag” and “like” or “favourite” history, phone numbers, login history, and IP address information.

Problems arise when a legal process return is received as an electronic data file that includes unstructured data. The unstructured data, for example, may need to be manually processed by law enforcement agencies in order to aggregate the data and produce useful reports. Such manual processing may take a significant amount of time (e.g., weeks or months) and can reduce the value of the acquired information, as the information may become stale or irrelevant during that time. Moreover, the size of unstructured electronic data files can make it difficult or impossible to view the files using native file viewers. For example, legal process returns that include unstructured data can include several hundred thousand pages of data. These electronic data files may exceed 500 MB in size, making it impossible for agencies to view and search the files on conventional data management systems.

BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made to the accompanying drawings, which illustrate exemplary embodiments of the present disclosure and in which:

FIG. 1 is a block diagram of an exemplary system for structuring data from unstructured electronic data files, consistent with embodiments of the present disclosure.

FIG. 2 is a block diagram of an exemplary data structuring system for structuring data from unstructured electronic data files, consistent with embodiments of the present disclosure.

FIG. 3 illustrates an example object model, consistent with embodiments of the present disclosure.

FIG. 4 illustrates an example implementation of an interactive GUI, consistent with embodiments of the present disclosure.

FIGS. 5-7B illustrate embodiments of example data reports generated by the exemplary data structuring system of FIG. 2, consistent with embodiments of the present disclosure.

FIG. 8 is a flow diagram depicting an example method for structuring data from unstructured electronic data files, consistent with embodiments of the present disclosure.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Reference will now be made in detail to exemplary embodiments, the examples of which are illustrated in the accompanying drawings. Whenever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.

The disclosed embodiments describe improved methods and systems for structuring data from unstructured electronic data files. The improved data structuring systems and methods can receive electronic data files including unstructured social media content in excess of 500 MB in size, parse the unstructured content, structure the parsed content by assigning object types and property types to the parsed content, and store the structured content in a database. The disclosed data structuring systems and methods may aggregate the structured content to generate various types of data reports. The reports may include, for example, reconstructed conversations between a subject and their contacts, a list of normalized phone numbers associated with the subject, a geographic mapping of IP addresses associated with the subject, a list of IP addresses shared between the subject and other persons, a timeline of specific events (logins, subject movement, etc.), and other reports. The data structuring systems and methods may also present the aggregated structured content in an interactive graphical user interface that allows for free-form customization and exploration of the aggregated structured content.

Accordingly, the systems and methods described herein are capable of filtering large amounts of data in a quick, logical, and visually associative way. More specifically, the systems and methods can, among other things, provide the ability to display information about events and entities both temporally and geographically, and allow for the selection and grouping of different entities and events on the graphical representation. Furthermore, the disclosed systems and methods are capable of resolving multiple instances of object and property references across enterprise databases into a canonical format based on a database ontology.

FIG. 1 is a block diagram of an exemplary system environment 100 for structuring data from unstructured electronic data files, consistent with embodiments of the present disclosure. As shown in FIG. 1, system environment 100 includes a number of components. It will be appreciated from this disclosure, however, that the number and arrangement of these components is exemplary only and provided for purposes of illustration. Other arrangements and numbers of components may be utilized without departing from the teachings and embodiments of the present disclosure.

As shown in the example embodiment of FIG. 1, system environment 100 may include one or more social media platforms 110, 120. Social media platform 110, 120 may include platforms such as, for example, Facebook, Twitter, Instagram, SureSpot, Kik, PalTalk, or any other social media platform known in the art. Social media platform 110, 120 may be implemented by, for example, a server, a server system comprising a plurality of servers, a server farm comprising a load balancing system and a plurality of servers, a mainframe computer, or any combination of these components. In certain embodiments, social media platform 110, 120 may be a standalone computing system or apparatus, or it may be part of a subsystem, which may be part of a larger system. For example, social media platform 110, 120 may represent distributed servers that are remotely located and communicate over a communications medium (e.g., network 150) or over a dedicated network, for example, a LAN. In some embodiments, social media platform 110, 120 may be implemented with hardware devices and/or software applications running thereon. In some embodiments, social media platform 110, 120 may be configured to communicate to and/or through network 150 with other components such as data structuring system 130 and database 140, and vice-versa. Also, in some embodiments, social media platform 110, 120 may implement aspects of the present disclosure without the need for accessing another device, component, or network, such as network 150.

Network 150 may include any combination of communications networks. For example, network 150 may include the Internet and/or any type of wide area network, an intranet, a metropolitan area network, a local area network (LAN), a wireless network, a cellular communications network, etc. In some embodiments, social media platform 110, 120 may be configured to transmit data and information through network 150 to an appropriate data importer, such as, for example, data structuring system 130. For example, social media platform 110, 120 may be configured to transmit electronic data files including various types of content to data structuring system 130. In some aspects, social media platform 110, 120 may also be configured to receive information from data structuring system 130 through network 150.

Data structuring system 130 may be configured to communicate and interact with social media platform 110, 120, and database 140. In certain embodiments, data structuring system 130 may be a standalone system or apparatus, or it may be part of a subsystem, which may be part of a larger system. For example, data structuring system 130 may represent a distributed system that includes remotely located sub-system components that communicate over a communications medium (e.g., network 150) or over a dedicated network, for example, a LAN.

In some embodiments, data structuring system 130 may be configured to receive data and information through network 150 from various devices and systems, such as, for example, social media platform 110, 120. For example, data structuring system 130 may be configured to receive legal process returns in the form of electronic data files from social media platform 110, 120, and other devices and systems. The electronic data files may be received in various file formats and may include content that is provided by social media platform 110, 120 in response to a legal process, such as a warrant, national security letter, subpoena, etc., relating to a criminal investigation conducted by a law enforcement agency. The content may include social media content associated with a subject of the criminal investigation such as, for example, contact information, friend lists, private messages, phone numbers, login information, IP address information, photos, photo albums, profiles of persons associated with the subject, email addresses, public social media posts (e.g., wall posts, microblog posts such as Tweets, and status updates), location updates (e.g., check-ins and public posts regarding the subject's location), etc. Data structuring system 130 may be configured to structure and import the content included in the received electronic data files into one or more structured databases such as, for example, database 140.

Database 140 may include one or more logically and/or physically separate databases configured to store data. The data stored in database 140 may be received from data structuring system 130, from social media platform 110, 120, and/or may be provided as input using conventional methods (e.g., data entry, data transfer, data uploading, etc.). The data stored in database 140 may take or represent various forms including, but not limited to, electronic data files, object mappings, property mappings, report templates, user profile information, and a variety of other electronic data or any combination thereof. In some embodiments, database 140 may include separate databases that store electronic data files, object and property mappings, and report templates, respectively. In other embodiments, the databases that store electronic data files, object and property mappings, and report templates can be combined in various combinations. In still other embodiments, database 140 includes a single database that stores electronic data files, object and property mappings, and report templates.

In some embodiments, database 140 may be implemented using any suitable form of a computer-readable storage medium. In some embodiments, database 140 may be maintained in a network attached storage device, in a storage area network, or combinations thereof, etc. Furthermore, database 140 may be maintained and queried using numerous types of database software and programming languages, for example, SQL, MySQL, IBM DB2®, Microsoft Access®, PERL, C/C++, Java®, etc. Although FIG. 1 shows database 140 associated with data structuring system 130, database 140 may be a standalone database that is accessible via network 150, database 140 may be included in data structuring system 130, or database 140 may be associated with or provided as part of a system or environment that may be accessible to social media platform 110, 120 and/or other components.

FIG. 2 is a block diagram of an exemplary data structuring system 130 for implementing embodiments and aspects of the present disclosure. For example, data structuring system 130 may be used for structuring data from unstructured electronic data files. The arrangement and number of components included in data structuring system 130 is provided for purposes of illustration. Additional arrangements, number of components, and other modifications may be made, consistent with the present disclosure.

As shown in FIG. 2, data structuring system 130 may include one or more communications interfaces 210. Communications interface 210 may allow data and/or information to be transferred between data structuring system 130 and network 150, social media platform 110, 120, database 140, and/or other components. For example, communications interface 210 may be configured to receive legal process returns in the form of electronic data files that include unstructured content. Some non-limiting examples of electronic data files include word processing files (.pdf, .doc, .docx, .txt, .log, .rtf, etc.), spreadsheets (.xls, .xlsx, .ods, etc.), comma separated values (CSV) files, presentations, archived and compressed files (e.g., ZIP files, 7z files, cab files, RAR files, etc.), database files, PDF files, PUB files, image files, XML files, specialized tax and financial files (e.g., Open Financial Exchange and Interactive Financial Exchange files), tabulated data files, and webpage files (e.g., HTML files). The received electronic data files may include various types of unstructured content. For example, the received electronic data files may include social media data associated with a subject of a criminal investigation as described above in reference to FIG. 1.

Examples of communications interface 210 may include a modem, a wired or wireless communications interface (e.g., an Ethernet, Wi-Fi, Bluetooth, Near Field Communication, WiMAX, WAN, LAN, etc.), a communications port (e.g., USB, IEEE 1394, DisplayPort, DVI, HDMI, VGA, Serial port, etc.), a PCMCIA slot and card, etc. Communications interface 210 may receive data and information in the form of signals, which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 210. These signals may be provided to communications interface 210 via a communications path (not shown), which may be implemented using wireless, wire, cable, fiber optics, radio frequency (“RF”) link, and/or other communications channels.

Data structuring system 130 may also include one or more file databases 220. File database 220 may be configured to store electronic data files received by data structuring system 130 at communications interface 210.

Data structuring system 130 may also include one or more structuring components 230 that may parse the unstructured social media content included in the electronic data files stored in file database 220 and structure the parsed data according to a database ontology 240. Exemplary embodiments for defining an ontology (such as database ontology 240) are described in U.S. Pat. No. 7,962,495 (the '495 Patent), issued Jun. 14, 2011, the entire contents of which are expressly incorporated herein by reference. Among other things, the '495 Patent describes embodiments that define a dynamic ontology for use in creating data in a database. For creating database ontology 240, for example, one or more object types may be created where each object type can include one or more properties. The attributes of object types or property types of database ontology 240 can be edited or modified at any time.

In some embodiments, object types may be further divided into a number of sub-categories. For example, object types may be divided into entity types, event types, and document types. Entity types may define a person, place, thing, or idea. Examples of entity types include a social media platform profile (e.g., a Facebook™ or Twitter™ user profile), IP address, email address, photo album, friends list, and location. Event types may define a type of social media platform event associated with the subject of a criminal investigation. Event types may include, for example, the subject logging into their social media platform profile, posting a photo to the subject's social media platform profile, sending friend requests, and accepting friend requests. Document types may define a type of social media platform document created by the subject or the subject's contacts. Examples of document types include private messages, status updates, microblog posts (e.g., Facebook™ wall posts and Twitter™ Tweets), comments on other users' microblog posts, pictures, and videos.

In some embodiments, each property type is declared to be representative of one or more object types. A property type is representative of an object type when the property type is intuitively associated with the object type. For example, a property type of “Text/Description” may be representative of an object type “Private Message” but not representative of an object type “Photo Album.” In some embodiments, each property type has one or more components and a base type. In some embodiments, a property type may comprise a string, a date, a number, or a composite type consisting of two or more string, date, or number elements. Thus, property types are extensible and can represent complex data structures. Further, a parser definition can reference a component of a complex property type as a unit or token.
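As an illustration only, the ontology concepts above (object types divided into entity, event, and document categories; property types with a base type, optional components, and the set of object types they are representative of) might be modeled in Python as follows. The class names `Category`, `ObjectType`, `PropertyType`, and `Ontology` are hypothetical and are not drawn from the '495 Patent; this is a minimal in-memory sketch, not a persistent database ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Sub-categories of object types."""
    ENTITY = "entity"
    EVENT = "event"
    DOCUMENT = "document"


@dataclass
class ObjectType:
    name: str
    category: Category


@dataclass
class PropertyType:
    name: str
    base_type: str                          # "string", "date", "number", or "composite"
    components: tuple[str, ...] = ()        # named components of a composite type
    representative_of: set[str] = field(default_factory=set)  # object type names


@dataclass
class Ontology:
    object_types: dict[str, ObjectType] = field(default_factory=dict)
    property_types: dict[str, PropertyType] = field(default_factory=dict)

    def add_object_type(self, name: str, category: Category) -> None:
        # Re-adding a name overwrites it, so types stay editable at any time.
        self.object_types[name] = ObjectType(name, category)

    def add_property_type(self, prop: PropertyType) -> None:
        # A property type may only be representative of known object types.
        unknown = prop.representative_of - set(self.object_types)
        if unknown:
            raise ValueError(f"unknown object types: {sorted(unknown)}")
        self.property_types[prop.name] = prop

    def is_representative(self, prop_name: str, object_type: str) -> bool:
        return object_type in self.property_types[prop_name].representative_of


ontology = Ontology()
ontology.add_object_type("Private Message", Category.DOCUMENT)
ontology.add_object_type("Photo Album", Category.ENTITY)
ontology.add_object_type("Login", Category.EVENT)
ontology.add_property_type(
    PropertyType("Text/Description", "string", representative_of={"Private Message"})
)
ontology.add_property_type(PropertyType("Name", "composite", components=("Last", "First")))

print(ontology.is_representative("Text/Description", "Private Message"))  # True
print(ontology.is_representative("Text/Description", "Photo Album"))      # False
```

Keeping representativeness as a set on each property type reduces the check to a membership test, and because the dictionaries can be updated at any time, the sketch mirrors an editable ontology.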

An example of a property having multiple components is a Name property having a Last Name component and a First Name component. An example of raw input data is “Smith, Jane.” An example parser definition specifies an association of imported input data to object property components as follows: {LAST_NAME}, {FIRST_NAME} → Name:Last, Name:First. In some embodiments, the association {LAST_NAME}, {FIRST_NAME} is defined in a parser definition using regular expression symbology. The association {LAST_NAME}, {FIRST_NAME} indicates that a last name string followed by a first name string comprises valid input data for a property of type Name. In contrast, input data of “Smith Jane” would not be valid for the specified parser definition, but a user could create a second parser definition that does match input data of “Smith Jane.” The definition Name:Last, Name:First specifies that matching input data values map to components named “Last” and “First” of the Name property. As a result, parsing the unstructured data in an electronic data file using the parser definition results in assigning the value “Smith” to the Name:Last component of the Name property, and the value “Jane” to the Name:First component of the Name property.
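The parser definition above can be sketched with Python's standard `re` module. This sketch assumes that `{LAST_NAME}` and `{FIRST_NAME}` each expand to a single word of letters, apostrophes, or hyphens, and that the rest of the definition is literal text. The `TOKENS` table and the `compile_parser_definition` helper are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical token vocabulary: each {TOKEN} expands to a named regex group.
TOKENS = {
    "LAST_NAME": r"(?P<LAST_NAME>[A-Za-z'-]+)",
    "FIRST_NAME": r"(?P<FIRST_NAME>[A-Za-z'-]+)",
}


def compile_parser_definition(pattern: str, targets: list[str]):
    """Compile a definition such as "{LAST_NAME}, {FIRST_NAME}" with targets
    ["Name:Last", "Name:First"] into a function mapping raw input to
    property components, or to None when the input does not match."""
    # re.split with a capturing group alternates literal text and token names.
    parts = re.split(r"\{(\w+)\}", pattern)
    token_names = parts[1::2]
    if len(token_names) != len(targets):
        raise ValueError("each token needs exactly one target component")
    regex = "".join(
        TOKENS[part] if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )
    compiled = re.compile(regex)

    def parse(raw: str) -> dict[str, str] | None:
        match = compiled.fullmatch(raw.strip())
        if match is None:
            return None
        return {target: match.group(name) for name, target in zip(token_names, targets)}

    return parse


parse_name = compile_parser_definition("{LAST_NAME}, {FIRST_NAME}", ["Name:Last", "Name:First"])
print(parse_name("Smith, Jane"))  # {'Name:Last': 'Smith', 'Name:First': 'Jane'}
print(parse_name("Smith Jane"))   # None
```

A second definition such as `"{LAST_NAME} {FIRST_NAME}"`, compiled the same way, would accept the “Smith Jane” form.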

In some embodiments, object types and property types may be specific to each social media platform. For example, database ontology 240 may include sets of object types and property types that are specific to Facebook™, Twitter™, Instagram™, etc. In order to determine which set of object/property types to use for an electronic data file, structuring component 230 may scan a header included in the electronic data file to detect a social media platform identifier. For example, the header may include the name Facebook™ and the warrant or subpoena number. Structuring component 230 may detect the name Facebook™ in the file and select the set of Facebook™ object/property types in response.
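Here is a minimal sketch of header-based platform detection. It assumes the header is the first few lines of the extracted text and that a case-insensitive substring match on the platform name is sufficient. The `PLATFORM_ONTOLOGIES` table, its contents, and the function names are hypothetical placeholders for the platform-specific sets in database ontology 240.

```python
from __future__ import annotations

# Hypothetical platform-specific object type sets keyed by a lowercase identifier.
PLATFORM_ONTOLOGIES = {
    "facebook": {"object_types": {"Profile", "Private Message", "Wall Post", "Login"}},
    "twitter": {"object_types": {"Profile", "Tweet", "Direct Message", "Login"}},
    "instagram": {"object_types": {"Profile", "Photo", "Comment", "Login"}},
}


def detect_platform(text: str, header_lines: int = 20) -> str | None:
    """Scan only the first `header_lines` lines (assumed to be the header) and
    return the first known platform identifier found, or None."""
    header = "\n".join(text.splitlines()[:header_lines]).lower()
    for platform in PLATFORM_ONTOLOGIES:
        if platform in header:
            return platform
    return None


def select_type_set(text: str) -> dict:
    """Return the object/property type set for the detected platform."""
    platform = detect_platform(text)
    if platform is None:
        raise ValueError("no social media platform identifier found in header")
    return PLATFORM_ONTOLOGIES[platform]


sample = "Facebook Business Record\nWarrant #0000 (placeholder)\n...\n"
print(detect_platform(sample))  # facebook
```

A production parser would likely need stricter matching (for example, word boundaries or platform-specific header layouts), because a bare substring test can be fooled by a platform name that appears incidentally in the header.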

In some embodiments, parser 232 may parse the unstructured content included in electronic data files stored in file database 220 to identify one or more objects based on the set of object/property types selected by structuring component 230. In order to parse the unstructured content, parser 232 may scan the unstructured content using natural language processing techniques to identify one or more words or strings of words. In some embodiments, where an electronic data file includes text that is unrecognizable by parser 232 (e.g., where the file includes PDF images of text), structuring component 230 may extract the text using techniques such as, for example, optical character recognition, optical word recognition, intelligent character recognition, and intelligent word recognition. Parser 232 may compare the identified words or strings of words to the selected set of object types defined in database ontology 240 to identify object types included in the electronic data file. Once an object type has been identified, parser 232 may identify objects of that object type included in the electronic data file. As an example, parser 232 may identify the string “Registered Email Address” and compare the string to object types defined in database ontology 240. If the string matches a known object type, parser 232 may identify the next string of text as the subject's email address (e.g., johndoe@email.com). A mapper 234 may assign object types and property types to the identified objects. The objects, assigned object types, and assigned property types make up a structured object model of the electronic data file. Each object model may correspond to a legal process return received in response to a legal process for social media platform content associated with a subject. The subject may be, for example, a subject of a criminal investigation conducted by a law enforcement agency. Object models may be stored in an object model database 250 and are described in more detail below in reference to FIG. 3.
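The label-then-value pattern in the email example can be sketched as follows. This sketch swaps the natural language processing mentioned above for plain exact matching of whole lines against a table of known labels. The `FIELD_LABELS` table, the `StructuredObject` class, and the `parse_and_map` function are hypothetical. For illustration only, the single function plays the roles of both parser 232 and mapper 234.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical labels mapped to (object type, property type) pairs.
FIELD_LABELS = {
    "registered email address": ("Email Address", "Address"),
    "phone numbers": ("Phone Number", "Number"),
    "registration ip": ("IP Address", "Address"),
}


@dataclass
class StructuredObject:
    object_type: str
    property_type: str
    value: str


def parse_and_map(text: str) -> list[StructuredObject]:
    """When a non-empty line matches a known label, treat the next
    non-empty line as the value of an object of the mapped type."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    objects: list[StructuredObject] = []
    i = 0
    while i < len(lines):
        mapping = FIELD_LABELS.get(lines[i].lower())
        if mapping is not None and i + 1 < len(lines):
            object_type, property_type = mapping
            objects.append(StructuredObject(object_type, property_type, lines[i + 1]))
            i += 2  # consume the label and its value
        else:
            i += 1
    return objects


sample = "Registered Email Address\njohndoe@email.com\n\nRegistration Ip\n203.0.113.7\n"
for obj in parse_and_map(sample):
    print(obj)
```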

In some embodiments, an object explorer 260 may generate an interactive graphical user interface (GUI) that allows for the customization and exploration of the structured objects and properties. For example, the interactive GUI may include various content filters that aggregate the structured objects and properties based on various filter properties. The content filters may, for example, filter objects based on entity type (e.g., IP address, email address, friends list, etc.), event type (e.g., login events, photo post events, etc.), and document type (e.g., private message, social media profile status update, wall posts, etc.). The content filters may also filter properties based on, for example, property types (e.g., warrant number, online identifier, date range, location, etc.).
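The content filters can be thought of as composable predicates over structured objects. The sketch below assumes a simple `Obj` record with a category, an object type, and a property dictionary. All names are hypothetical, and filters passed together are combined with a logical AND.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Iterable


@dataclass
class Obj:
    category: str        # "entity", "event", or "document"
    object_type: str     # e.g. "Login", "Private Message"
    properties: dict[str, Any] = field(default_factory=dict)


Filter = Callable[[Obj], bool]


def by_type(*object_types: str) -> Filter:
    """Keep objects whose object type is one of `object_types`."""
    return lambda obj: obj.object_type in object_types


def by_property(name: str, value: Any) -> Filter:
    """Keep objects whose property `name` equals `value`."""
    return lambda obj: obj.properties.get(name) == value


def by_date_range(start: datetime, end: datetime, prop: str = "time") -> Filter:
    """Keep objects whose datetime property `prop` lies within [start, end]."""
    def check(obj: Obj) -> bool:
        t = obj.properties.get(prop)
        return t is not None and start <= t <= end
    return check


def apply_filters(objects: Iterable[Obj], *filters: Filter) -> list[Obj]:
    """Keep objects that pass every filter (filters are ANDed together)."""
    return [obj for obj in objects if all(f(obj) for f in filters)]


objects = [
    Obj("event", "Login", {"time": datetime(2013, 7, 15, 23, 0), "warrant": "W-1"}),
    Obj("event", "Login", {"time": datetime(2013, 7, 17, 9, 0), "warrant": "W-1"}),
    Obj("document", "Private Message", {"time": datetime(2013, 7, 15, 23, 5)}),
]
night_logins = apply_filters(
    objects,
    by_type("Login"),
    by_date_range(datetime(2013, 7, 15, 22, 30), datetime(2013, 7, 16, 3, 15)),
)
print(len(night_logins))  # 1
```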

Once the structured objects and/or properties have been filtered based on one or more content filters, the interactive GUI may allow customized data visualizations of the filtered data to be displayed. For example, a timeline of login events may be presented in the interactive GUI when the structured objects are filtered by a login event type. The timeline may display when the login events occurred. When unstructured content associated with multiple subjects has been structured and aggregated, the timeline presentation on the interactive GUI can display how many login events occurred at a given time and which subject logged in at a particular time, so that conclusions about real-world interactions between the subjects can be deduced or inferred. In some embodiments, the customized data visualizations of the filtered data can be further customized, or a subset of the visualized data can be selected so that another customized data visualization can be displayed. For example, based on the login timeline example above, a subset of the visualized login data can be selected, geocoded (using a MaxMind database, for example), and used to generate a customized data visualization of a map showing the geographic locations associated with each selected login event. Accordingly, the interactive GUI allows for free-form interaction and customization of the structured objects and properties to generate useful visualizations from which various conclusions and extrapolations can be drawn.
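The multi-subject login timeline can be approximated by bucketing login events by hour and counting the logins per subject in each bucket. Hourly granularity and the `login_timeline` and `overlapping_hours` helpers are assumptions made for illustration. Geocoding (for example, against a MaxMind database) is deliberately left out.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import datetime


def login_timeline(logins: list[tuple[str, datetime]]) -> dict[datetime, Counter]:
    """Bucket (subject, login time) pairs by hour and count logins per subject,
    returning the buckets in chronological order."""
    buckets: defaultdict[datetime, Counter] = defaultdict(Counter)
    for subject, ts in logins:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour][subject] += 1
    return dict(sorted(buckets.items()))


def overlapping_hours(timeline: dict[datetime, Counter]) -> list[datetime]:
    """Hours in which more than one subject logged in."""
    return [hour for hour, counts in timeline.items() if len(counts) > 1]


logins = [
    ("subject_a", datetime(2013, 7, 15, 22, 40)),
    ("subject_b", datetime(2013, 7, 15, 22, 55)),
    ("subject_a", datetime(2013, 7, 16, 1, 10)),
]
timeline = login_timeline(logins)
print(overlapping_hours(timeline))  # [datetime.datetime(2013, 7, 15, 22, 0)]
```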

As another example of the above interactive GUI, structured photograph objects may be filtered by an MD5 hash property type so that photograph objects stored in object model database 250 with matching MD5 hashes can be aggregated and their properties analysed. For example, a photograph with a given MD5 hash may have been posted on a social media profile of a subject. The interactive GUI can filter structured photograph objects based on the MD5 hash of the posted photograph to identify other social media profiles associated with subjects that have also posted the same photograph, thereby allowing conclusions and inferences about interactions between subjects who have posted the same photograph to be drawn.
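Here is a sketch of MD5-based photo matching using Python's `hashlib`. It assumes each post is available as a (profile identifier, raw photo bytes) pair. `shared_photos` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw photo bytes."""
    return hashlib.md5(data).hexdigest()


def shared_photos(posts):
    """posts: iterable of (profile_id, photo_bytes) pairs. Returns
    {md5 hex digest: set of profile ids} for every photo hash that was
    posted by two or more distinct profiles."""
    profiles_by_hash = defaultdict(set)
    for profile_id, photo in posts:
        profiles_by_hash[md5_hex(photo)].add(profile_id)
    return {h: ids for h, ids in profiles_by_hash.items() if len(ids) > 1}


posts = [
    ("profile_1", b"<photo A bytes>"),
    ("profile_2", b"<photo A bytes>"),
    ("profile_3", b"<photo B bytes>"),
]
for digest, profiles in shared_photos(posts).items():
    print(digest, sorted(profiles))
```

MD5 matches only byte-identical files. A resized or re-encoded copy of the same picture produces a different hash, so finding visually similar photos would need a different technique, such as perceptual hashing.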

Object explorer 260 may also generate various types of data reports based on the object models stored in object model database 250. The data reports may include data models of objects and properties defined in an object model such as, for example, timelines and geographic mappings of events, histograms of objects and properties, reconstructions of social media conversations (e.g., private message conversations between two or more users), mappings of IP addresses shared between two or more users, picture matching, friends list graphs, and other types of data models.
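Two of the listed reports, conversation reconstruction and shared IP address mapping, reduce to simple grouping operations. The sketch below assumes that messages already carry sender, recipient, and timestamp properties, and that IP addresses have been collected per user. The `Message` class and both functions are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    time: datetime
    text: str


def reconstruct_conversations(messages: list[Message]) -> dict[frozenset, list[Message]]:
    """Group messages by unordered participant pair, then sort each thread by time."""
    threads: defaultdict[frozenset, list[Message]] = defaultdict(list)
    for msg in messages:
        threads[frozenset((msg.sender, msg.recipient))].append(msg)
    return {pair: sorted(msgs, key=lambda m: m.time) for pair, msgs in threads.items()}


def shared_ips(ips_by_user: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert {user: IPs} to {IP: users}, keeping IPs seen for two or more users."""
    users_by_ip: defaultdict[str, set[str]] = defaultdict(set)
    for user, ips in ips_by_user.items():
        for ip in ips:
            users_by_ip[ip].add(user)
    return {ip: users for ip, users in users_by_ip.items() if len(users) > 1}


msgs = [
    Message("subject", "contact", datetime(2013, 7, 15, 23, 5), "reply"),
    Message("contact", "subject", datetime(2013, 7, 15, 23, 0), "hello"),
]
thread = reconstruct_conversations(msgs)[frozenset({"subject", "contact"})]
print([m.text for m in thread])  # ['hello', 'reply']
```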

In order to generate a data report, object explorer 260 may provide instructions to a GUI generator 290 to generate a GUI of object explorer 260. In response to the received instructions, GUI generator 290 may generate an interactive GUI for display on a display 295. Data structuring system 130 may also include one or more input/output (I/O) devices 270 (e.g., physical keyboards, virtual touch-screen keyboards, mice, joysticks, styluses, etc.) that are configured to receive user instructions in the form of user input. The received instructions may include instructions to generate data reports based on object models stored in object model database 250. Object explorer 260 may receive the user input from I/O 270, generate the requested data report based on a report template associated with the requested data report, and provide instructions to GUI generator 290 for generating a display of the generated data report on display 295.

In some embodiments, object explorer 260 may include a template selector 262 that selects a report template among the report templates stored in a report template database 280. The template may be selected based on user input received from I/O 270. For example, the user input received at object explorer 260 may identify a data report type requested by the user, and template selector 262 may retrieve the report template corresponding to the requested data report type. As an example, if the user requests a data report of all the telephone numbers included in an object model, template selector 262 may select a telephone number histogram report template from report template database 280. As another example, if the user requests a data report including a geographic mapping of a subject's social media platform login activity between 10:30 p.m., Jul. 15, 2013 and 3:15 a.m., Jul. 16, 2013, template selector 262 may select the appropriate template from report template database 280.

Once template selector 262 has selected the appropriate report template for the requested data report, a template applicator 264 may obtain objects and properties included in the object model that are required by the report template. Template applicator 264 may generate the requested report using the obtained objects and properties based on the selected report template. Template applicator 264 may provide instructions for GUI generator 290 to display the generated data report on display 295.
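The split between the selector and the applicator can be sketched with a dictionary of templates, where each template names the object type it requires and a render function. The telephone number histogram below is a hypothetical template. Its normalization rule (keep digits only, and strip a leading `1` from 11-digit numbers) is an assumption that suits North American numbers only.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReportTemplate:
    report_type: str
    required_object_type: str
    render: Callable[[list[dict]], object]


def normalize_phone(raw: str) -> str:
    """Assumed normalization: digits only, dropping a leading US country code."""
    digits = re.sub(r"\D", "", raw)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def phone_histogram(objects: list[dict]) -> Counter:
    return Counter(normalize_phone(o["number"]) for o in objects)


# Hypothetical stand-in for report template database 280.
REPORT_TEMPLATES = {
    "phone_histogram": ReportTemplate("phone_histogram", "Phone Number", phone_histogram),
}


def select_template(report_type: str) -> ReportTemplate:
    """Plays the role of template selector 262."""
    return REPORT_TEMPLATES[report_type]


def apply_template(template: ReportTemplate, object_model: list[dict]):
    """Plays the role of template applicator 264: gather required objects, then render."""
    required = [o for o in object_model if o["type"] == template.required_object_type]
    return template.render(required)


object_model = [
    {"type": "Phone Number", "number": "+1 (555) 010-0199"},
    {"type": "Phone Number", "number": "555.010.0199"},
    {"type": "Email Address", "address": "johndoe@email.com"},
]
print(apply_template(select_template("phone_histogram"), object_model))
# Counter({'5550100199': 2})
```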

Structuring component 230, object explorer 260, and GUI generator 290 may be implemented as hardware modules configured to execute the functions described herein. Alternatively, one or more processors suitable for the execution of instructions may be configured to execute the functions of structuring component 230, object explorer 260, and GUI generator 290. For example, suitable processors include both general and special purpose microprocessors, programmable logic devices, field programmable gate arrays, specialized circuits, and any one or more processors of any kind of digital computer that may be communicatively coupled to a physical memory (not shown) storing structuring component 230, object explorer 260, and GUI generator 290 in the form of instructions executable by the processor. Suitable memories may include, for example, NOR or NAND flash memory devices, Read Only Memory (ROM) devices, Random Access Memory (RAM) devices, storage mediums such as, for example, hard drives, solid state drives, tape drives, RAID arrays, etc. As another example, the functions of structuring component 230, object explorer 260, and GUI generator 290 may be included in the processor itself such that the processor is configured to implement these functions.

File database 220, database ontology 240, object model database 250, and report template database 280 may be implemented by database 140 of FIG. 1. In some embodiments, one or more of databases 220, 240, 250, and 280 may be included in the same database. In some embodiments, one or more of databases 220, 240, 250, and 280 may be included in separate databases.

Display 295 may be implemented using devices or technology, such as a cathode ray tube (CRT) display, a liquid crystal display (LCD), a plasma display, a light emitting diode (LED) display, a touch screen type display such as capacitive or resistive touchscreens, and/or any other type of display known in the art.

FIG. 3 is illustrative of an exemplary object model 300 and a corresponding ontology (e.g., database ontology 240 in FIG. 2). The elements of exemplary object model 300 can be stored in an object model database (e.g., object model database 250 of FIG. 2).

Object model 300 can include, among other things, entities 310A-C, events 320A, and documents 330A-C. Each entity 310, event 320, and document 330 can further contain properties including, without limitation, representative properties, base properties, or complex properties (e.g., transcript properties 340A-B) made up of multiple sub-properties or components. Complex properties can be used to provide detailed information about entities, events, and documents.

As illustrated in FIG. 3, entity 310A may correspond to a social media platform profile associated with the subject of a criminal investigation, and entities 310B and 310C may correspond to social media platform profiles associated with persons with whom the subject of a criminal investigation has interacted. For example, the subject may have interacted with the associated persons via private message documents 330A and 330B.

Private message documents 330A and 330B may include various properties such as, for example, a transcript property, an IP address property, “TO” and “FROM” properties, and a “date/time” property. The transcript property, such as transcript property 350A, may contain the text of private message documents (e.g., private message document 330A) as well as additional properties. The additional properties may include, for example, the name of the transcript, the character count, read receipt information, telephone numbers included in the message, and/or any attachments in the message. For example, transcript property 350A may include telephone number property 350E, which may be assigned as a property of private message document 330A. In some embodiments, the transcript property could be in an audio format or some other format instead of written form. It is appreciated that many other commonly used formats, known to one of ordinary skill in the art, could replace a written or audio property.
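A complex transcript property might be modeled as a nested record whose derived sub-properties (character count and telephone numbers) are computed from the text. The class names and the deliberately loose `PHONE_RE` pattern, which recognizes only ten-digit North American style numbers, are assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed, deliberately loose pattern for ten-digit North American style numbers.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


@dataclass
class TranscriptProperty:
    text: str
    name: str = ""
    read: bool | None = None                # read receipt, if known
    attachments: list[str] = field(default_factory=list)

    @property
    def character_count(self) -> int:
        return len(self.text)

    @property
    def telephone_numbers(self) -> list[str]:
        return PHONE_RE.findall(self.text)


@dataclass
class PrivateMessageDocument:
    sender: str      # "FROM" property
    recipient: str   # "TO" property
    sent_at: str     # "date/time" property
    ip_address: str
    transcript: TranscriptProperty

    def promoted_properties(self) -> dict[str, list[str]]:
        """Telephone numbers found in the transcript are also assigned
        as properties of the message itself."""
        return {"Telephone Number": self.transcript.telephone_numbers}


message = PrivateMessageDocument(
    sender="310A", recipient="310B", sent_at="2013-07-15T23:05:00",
    ip_address="198.51.100.7",
    transcript=TranscriptProperty("call me at 555-010-0199 tonight"),
)
print(message.promoted_properties())  # {'Telephone Number': ['555-010-0199']}
```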

Additionally, events, documents, and entities can contain notes and media. Notes can provide a container for textual information related to the event, document, or entity. Media can represent binary data associated with the events, documents, or entities. Media data can take the form of, for example, text documents, images, videos, or specialized formats.

Moreover, both objects and properties can contain geospatial and temporal metadata. Geospatial metadata can provide a physical location associated with an object or property. For example, private message document 330A can have an IP address property 350B, which can be used to obtain the geographic location of the subject associated with social media profile entity 310A who sent the private message. As another example, login event 320A can have an IP address property 350C associated with the person associated with social media profile entity 310A logging into a social media platform. It is appreciated that the geospatial data can also be in any form that represents a location and is understood by the users of object model 300. Temporal metadata can represent either a specific point in time or a duration having a start time and an end time. For example, private message document 330A can contain a “TIME” property 350D indicating a specific date and time when the message was sent. In some embodiments, a duration can be indicated by including a start property and an end property, allowing calculation of the duration. The temporal data can be in any form (e.g., epoch time, UTC time, or local time) that represents the time of the event or the duration of the event. Moreover, in some embodiments, geospatial and temporal metadata can be correlated. For example, the geospatial and temporal metadata can correspond to one or more locations and times when a person visited those one or more locations.
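
The temporal forms mentioned above can be converted into one another with the Python standard library. The sketch below is a hedged illustration that converts epoch time to UTC, expresses a UTC time in a fixed-offset local zone, and computes a duration from start and end properties; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_utc(epoch_seconds: float) -> datetime:
    """Convert an epoch timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def to_local(moment: datetime, utc_offset_hours: float) -> datetime:
    """Express an aware datetime in a fixed-offset local time zone."""
    return moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))


def duration(start: datetime, end: datetime) -> timedelta:
    """Compute a duration from a start property and an end property."""
    if end < start:
        raise ValueError("end time precedes start time")
    return end - start
```

Normalizing every temporal property to timezone-aware UTC internally, and converting to local time only for display, avoids comparing naive and aware datetimes when events from different sources are merged.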

Entities 310, events 320, and documents 330 can serve as links indicating relationships between the various objects. For example, private message document 330A can contain “FROM” and “TO” properties. The “FROM” property links social media profile 310A to private message document 330A, and the “TO” property links social media profile 310B to private message document 330A. Thus, private message document 330A, while still containing its own relevant properties (e.g., temporal properties, geospatial properties, and transcript property 350A), can act as a complex link between social media profiles 310A and 310B.
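
One way to picture a document acting as a complex link is to derive (sender, document, recipient) triples from the “FROM” and “TO” properties. The sketch assumes each document is a plain dictionary with an "id" key; that representation and the function name message_links are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def message_links(documents: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Derive (sender, document, recipient) links from FROM/TO properties.

    Each document is assumed to be a dict with an "id" key plus optional
    "FROM" and "TO" keys holding profile IDs. Documents missing either
    property do not form a link and are skipped.
    """
    links = []
    for doc in documents:
        if "FROM" in doc and "TO" in doc:
            links.append((doc["FROM"], doc["id"], doc["TO"]))
    return links
```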

FIG. 4 illustrates an example implementation of an interactive GUI 400 for free-form exploration of structured objects and properties. In some embodiments, example interactive GUI 400 may be generated by a data structuring system (e.g., data structuring system 130 including an object explorer 260, both of FIG. 2). GUI 400 may include a set of content filters such as, for example, object types 410 and property types 420. The object types 410 filter may further be divided into sub-filters such as, for example, entity types 412, event types 414, and document types 416. Content filters 410-416 and 420 are exemplary only, and other filters may also be included in GUI 400. Content filters 410-416 and 420 allow for the aggregation of structured objects and properties so that customized data visualizations may be generated.

In some embodiments, GUI 400 may display customized data visualizations of data filtered by content filters 410-416 and 420. GUI 400 may include various visualization types 430 that can be used to generate displays of the filtered data. In the example illustrated in FIG. 4, a timeline visualization type, a pie chart visualization type, a histogram visualization type, and a bar chart visualization type are included in GUI 400. Other visualization types 430 and combinations of visualization types 430 may be included in GUI 400. In some embodiments, the visualization types 430 presented on GUI 400 may depend on the type of content filter selected. For example, if a login event type filter (one of event types 414) is selected, GUI 400 may display a timeline visualization type (that displays the login events on a timeline), a histogram visualization type (that displays the number of login events associated with various IP addresses), and a pie chart visualization type.

A customized data visualization may be generated using various techniques. For example, input may be received (from I/O 270 of FIG. 2, for example) in the form of a selection of an object type 410 or a property type 420 and a visualization type 430. The input may be received in various forms. For example, the input may be a user selecting an object type 410 or a property type 420 and dragging it on top of a visualization type 430. As another example, the input may be a user highlighting an object type 410 or a property type 420 (by clicking on it, for example) and highlighting a visualization type 430.

In some embodiments, the customized data visualizations displayed on GUI 400 can be further customized, or a subset of the visualized data can be selected so that another customized data visualization can be displayed.

FIGS. 5-7B illustrate example implementations of data reports. In some embodiments, the example data reports may be generated by a data structuring system (e.g., data structuring system 130 including an object explorer 260, both of FIG. 2). FIG. 5 in particular illustrates an example implementation of a telephone number histogram data report 500. As shown in FIG. 5, data report 500 may include a list of telephone numbers 510A-D. Telephone numbers 510A-D may have been included in one or more private messages (e.g., private message document 330A of FIG. 3) between a subject of a criminal investigation and another person (e.g., John Doe, entity 310A, and Jane Smith, entity 310B, both of FIG. 3). A parser (e.g., parser 232 of FIG. 2) may have parsed the private messages to identify telephone numbers 510A-D and normalize them to a telephone number format required by a database ontology (e.g., database ontology 240 of FIG. 2). As shown in data report 500, the data structuring system may represent, as a histogram, the number of times each telephone number 510A-D has shown up in a private message between the subject and another person. The histogram may include data bars 520A-D that graphically represent the number of times each telephone number 510A-D has shown up in a private message. The histogram may also include, proximate to data bars 520A-D, a numeric representation of the number of times each telephone number 510A-D has shown up in a private message. In some embodiments, and as shown in FIG. 5, telephone numbers 510A-D (and, by extension, data bars 520A-D) may be ordered such that the telephone number included in the most private messages between the subject and another person is listed first (e.g., telephone number 510A).
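
The normalize-then-count flow described for data report 500 might be sketched as follows. The target format "+1-XXX-XXX-XXXX" is a hypothetical stand-in for whatever format the database ontology requires, and the text rendering only approximates data bars 520A-D with a numeric count beside each bar.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def normalize_us_number(raw: str) -> Optional[str]:
    """Normalize a raw number to an assumed ontology format, or return None."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def telephone_histogram(numbers: Iterable[str]) -> List[Tuple[str, int]]:
    """Count normalized numbers, most frequent first."""
    normalized = (normalize_us_number(n) for n in numbers)
    counts = Counter(n for n in normalized if n is not None)
    return counts.most_common()  # ties keep first-seen order


def render_text_histogram(histogram: List[Tuple[str, int]], width: int = 20) -> str:
    """Render one bar per number, scaled to the largest count."""
    top = histogram[0][1] if histogram else 1
    lines = []
    for number, count in histogram:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{number} {bar} {count}")
    return "\n".join(lines)
```

Normalizing before counting matters: without it, "555-123-4567" and "(555) 123-4567" would be counted as two different numbers.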

In some embodiments, a user may interact with telephone numbers 510A-D via an I/O (e.g., I/O 270 of FIG. 2). The data structuring system may display a list of the private messages that included the selected telephone number 510A-D in response to the user's interaction.

FIG. 6 illustrates an example implementation of a conversation reconstruction data report 600. As shown in FIG. 6, data report 600 may include a list of private messages 610A-D. Private messages 610A-D may have been sent between a subject of a criminal investigation and another person (e.g., John Doe, entity 310A, and Jane Smith, entity 310B, both of FIG. 3). The data structuring system may generate and display data report 600 by, for example, identifying private messages included in one or more object models (e.g., object model 300 of FIG. 3) stored in an object model database (e.g., object model database 250 of FIG. 2). The private messages may be identified by selecting those whose “TO” and “FROM” properties include John Doe and Jane Smith.
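
A minimal sketch of the selection step, assuming each private message is a dictionary with "FROM", "TO", and "TIME" keys (the dictionary layout is an assumption): a message belongs to the conversation when its sender and recipient are exactly the two selected people in either direction, and the thread is then ordered by time.

```python
from datetime import datetime
from typing import Dict, List


def reconstruct_conversation(messages: List[Dict], person_a: str, person_b: str) -> List[Dict]:
    """Return messages exchanged between two profiles, oldest first."""
    pair = {person_a, person_b}
    # Set comparison matches both A->B and B->A messages.
    thread = [m for m in messages if {m["FROM"], m["TO"]} == pair]
    return sorted(thread, key=lambda m: m["TIME"])
```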

Data report 600 allows users to interact with private messages 610A-D. For example, a user may select a private message 610A-D via an I/O. In the example illustrated in FIG. 6, private message 610A has been selected by the user. In response, the data structuring system may generate a detailed display 620 of selected private message 610A. For example, detailed display 620 may include the entire content of selected private message 610A, the “TO” and “FROM” properties of private message 610A, and the “DATE” and “TIME” properties of private message 610A.

FIGS. 7A and 7B illustrate an example implementation of a login information data report 700 and a mapped login information data report 740, respectively. As shown in FIG. 7A, data report 700 may include a timeline 710 across which login data 720 are distributed. Login data 720 may correspond to login events such as, for example, a subject of a criminal investigation logging into a social media platform. Each bar of login data 720 may represent the number of login events that occurred at certain points in time along timeline 710. Each bar of login data 720 may span a specified time duration. For example, each bar of login data 720 may cover a one-hour time interval, a 30-minute time interval, or any other time interval.
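
The per-bar counts on timeline 710 amount to bucketing login times into fixed-width intervals. The sketch below assumes timezone-aware login times and aligns buckets to the Unix epoch, so one-hour bars start on the hour in UTC; both choices, and the name bin_logins, are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_logins(times: Iterable[datetime], interval: timedelta) -> List[Tuple[datetime, int]]:
    """Count login events per fixed-width interval (one bar per interval).

    `times` must be timezone-aware datetimes. Each event is assigned to the
    interval whose start is the largest epoch-aligned multiple of
    `interval` not after the event.
    """
    counts: Counter = Counter()
    for t in times:
        index = (t - EPOCH) // interval  # whole intervals since the epoch
        counts[EPOCH + index * interval] += 1
    return sorted(counts.items())
```

Changing the `interval` argument changes the bar width, for example from one hour to 30 minutes, without re-parsing the login events.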

In some embodiments, data report 700 may be an interactive data report. For example, the data structuring system may be configured to receive input from a user corresponding to a selection of a subset of login data 720. The user may highlight a time interval of login data 720 along timeline 710. As shown in the example illustrated in FIG. 7A, a subset 730 has been selected.

A data report illustrating the subset 730 of login data 720 geographically mapped may be displayed in response to the data structuring system receiving the user's selection of subset 730. For example, mapped login information data report 740 illustrated in FIG. 7B may include the subset 730 of login data 720 superimposed over a map 750. A scale adjuster 770 may be used to zoom map 750 in and/or out so that more granularity can be obtained or more of subset 730 can be displayed at one time.

Data report 740 may illustrate the subject's locations 760 at the time of each login event included in the subset 730 of login data 720. In other words, locations 760 correspond to the subject's geographic location at the time the subject logged into the social media platform. In order to superimpose the subset 730 of login data 720 over map 750, the data structuring system may trace the IP address properties associated with each login event to obtain a set of geographic coordinates or other location data associated with the login event. The data structuring system may display the obtained location data as locations 760 over map 750.
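
Tracing IP address properties to coordinates normally relies on an external geolocation database, which this disclosure does not specify. The sketch below substitutes a small hypothetical lookup table built on the RFC 5737 documentation ranges with placeholder coordinates, using the standard `ipaddress` module for network membership checks.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]

# Hypothetical lookup table: documentation address ranges mapped to
# placeholder coordinates. A deployment would query a real geolocation source.
GEO_TABLE: Dict[ipaddress.IPv4Network, Coordinates] = {
    ipaddress.ip_network("198.51.100.0/24"): (40.71, -74.01),
    ipaddress.ip_network("203.0.113.0/24"): (34.05, -118.24),
}


def locate(ip: str) -> Optional[Coordinates]:
    """Return coordinates for the first table network containing `ip`."""
    address = ipaddress.ip_address(ip)
    for network, coords in GEO_TABLE.items():
        if address in network:
            return coords
    return None


def map_logins(logins: List[Dict[str, str]]) -> List[Tuple[str, Coordinates]]:
    """Pair each login's TIME with coordinates traced from its IP property."""
    located = []
    for login in logins:
        coords = locate(login["IP"])
        if coords is not None:  # logins that cannot be traced are omitted
            located.append((login["TIME"], coords))
    return located
```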

It is to be understood that the example data reports illustrated in FIGS. 5-7B are exemplary only and that other data reports are contemplated. Another example data report may include a picture matching report. For a picture matching report, a data structuring system may determine an identifier associated with a picture selected by a user and may use the identifier to identify all the social media platform profiles associated with the picture (e.g., profiles that include the photo in a photo album, wall post, private message, etc.). Identifiers may include, for example, EXIF data, MD5 hash values, or other identifiers known in the art. The data structuring system may display the identified profiles as a graph, histogram, or any other format of data report.
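
Using an MD5 hash value as the picture identifier might look like the sketch below. The input shape, an iterable of (profile_id, image_bytes) pairs, is an assumption. Note that an MD5 match only finds byte-identical files; a resized or re-encoded copy of the same photo produces a different hash, which is one reason other identifiers such as EXIF data may also be useful.

```python
import hashlib
from typing import Iterable, Set, Tuple


def md5_identifier(data: bytes) -> str:
    """Return the hex MD5 digest used here as a picture identifier."""
    return hashlib.md5(data).hexdigest()


def profiles_with_picture(selected: bytes, pictures: Iterable[Tuple[str, bytes]]) -> Set[str]:
    """Return profile IDs whose pictures have the same MD5 hash as `selected`.

    `pictures` is an assumed iterable of (profile_id, image_bytes) pairs
    gathered from photo albums, wall posts, private messages, etc.
    """
    target = md5_identifier(selected)
    return {profile for profile, data in pictures if md5_identifier(data) == target}
```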

Another data report may include a shared IP address data report. The shared IP address data report may include all the social media platform profiles associated with login events having the same IP address property. For example, a user may select an IP address associated with a subject of a criminal investigation logging into a social media platform. The data structuring system may determine all the social media platform profiles that logged in using the same IP address and display the identified profiles as a graph, histogram, or any other format of data report.
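
A shared IP address report can be sketched as an inverted index from IP address to the set of profiles that logged in from it. The (profile_id, ip_address) pair representation of login events is an assumption made for this example.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple


def profiles_by_ip(logins: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index login events, given as (profile_id, ip_address) pairs, by IP."""
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for profile, ip in logins:
        index[ip].add(profile)  # a set collapses repeated logins
    return dict(index)


def shared_ip_report(logins: Iterable[Tuple[str, str]], selected_ip: str) -> List[str]:
    """List every profile that logged in from the selected IP address."""
    return sorted(profiles_by_ip(logins).get(selected_ip, set()))
```

Building the full index once is convenient when a user is expected to inspect several IP addresses in turn; for a single query, a direct filter over the login events would suffice.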

FIG. 8 depicts a flowchart of an example method 800, consistent with some embodiments and aspects of the present disclosure. Method 800 may be implemented, for example, for structuring data from unstructured electronic data files. The number and sequence of operations in FIG. 8 are provided for purposes of illustration and may be modified, enhanced, substituted, or otherwise changed in view of the present disclosure. In some embodiments, method 800 may be implemented as one or more computer programs executed by one or more processors. Moreover, in some embodiments, aspects of method 800 may be implemented by a data structuring system (e.g., data structuring system 130 having one or more processors executing one or more computer programs stored on a non-transitory computer readable medium) or a social media platform (e.g., social media platform 110, 120 having one or more processors executing one or more computer programs stored on a non-transitory computer readable medium). In some embodiments, method 800 may be implemented by a combination of a data importation system and a client device.

In some embodiments, example method 800 may include receiving an electronic data file at 810. For example, the data structuring system may receive legal process returns in the form of electronic data files from one or more social media platforms via a communications interface (e.g., communications interface 210 of FIG. 2). The legal process returns may be provided, for example, in response to a legal process such as a search warrant, national security letter, subpoena, etc., associated with a criminal investigation of a subject conducted by a law enforcement agency. The electronic data files may include any electronic file format and various types of structured and/or unstructured content. Example electronic data file formats include word processing files (.doc, .docx, .txt, .log, .rtf, etc.), spreadsheets (.xls, .xlsx, .ods, etc.), comma separated values (CSV) files, presentations, archived and compressed files (e.g., ZIP files, 7z files, cab files, RAR files, etc.), database files, PDF files, PUB files, image files, XML files, specialized tax and financial files (e.g., Open Financial Exchange and Interactive Financial Exchange files), tabulated data files, and webpage files (e.g., HTML files). The content may include, for example, social media data associated with the subject of the criminal investigation as described above in reference to FIG. 1.

In some embodiments, example method 800 may include parsing the electronic data file to identify one or more objects included in the electronic data file at 820. For example, when the content included in the electronic data file received at 810 is unstructured content, the data structuring system may parse the unstructured data so that the data can be converted to a structured format. In some embodiments, the data structuring system includes a parser (e.g., parser 232 of FIG. 2) that parses the unstructured content using the parsing techniques described above in reference to FIG. 2. For example, the parser may identify words or strings of words in the received electronic data file and compare the identified words or strings of words to a selected set of object types defined in a database ontology (e.g., database ontology 240 of FIG. 2) to identify objects included in the electronic data file.
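
The comparison of identified strings against ontology object types at 820 could be sketched as below. The ontology fragment and the assumption that each record begins with a "Label:" prefix are hypothetical; real legal process returns vary by platform, which is why a platform-specific ontology would be selected first.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology fragment: label strings seen in a return -> object types.
ONTOLOGY_OBJECT_TYPES: Dict[str, str] = {
    "Message": "private_message",
    "Login": "login_event",
    "Photo": "photo_document",
}


def identify_objects(text: str, ontology: Dict[str, str] = ONTOLOGY_OBJECT_TYPES) -> List[Tuple[int, str, str]]:
    """Return (line_number, matched_string, object_type) for each match.

    A line matches when the string before its first colon equals a label
    defined in the ontology.
    """
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        label = line.split(":", 1)[0].strip()
        if label in ontology:
            found.append((number, label, ontology[label]))
    return found
```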

In some embodiments, example method 800 may include processing the unstructured content to identify one or more properties associated with the identified objects at 830. For example, the data structuring system may include a mapper (e.g., mapper 234 of FIG. 2) that assigns properties to the objects identified at 820. The objects, assigned object types, and assigned property types may be assigned to a structured object model (e.g., object model 300 of FIG. 3) of the electronic data file corresponding to the legal process return. In some embodiments, the object models may be stored in an object model database (e.g., object model database 250 of FIG. 2).
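
A mapper along these lines might attach each "Key: value" line that follows an identified object to that object as a typed property, in the spirit of a second string following a first string in the content. The label and property-type tables, and the line-oriented input format, are hypothetical assumptions for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical ontology fragments for one platform's return format.
OBJECT_LABELS: Dict[str, str] = {"Message": "private_message", "Login": "login_event"}
PROPERTY_TYPES: Dict[str, str] = {
    "From": "FROM", "To": "TO", "Time": "TIME", "IP": "IP_ADDRESS", "Body": "TRANSCRIPT",
}


@dataclass
class ParsedObject:
    object_type: str
    properties: Dict[str, str] = field(default_factory=dict)


def map_properties(text: str) -> List[ParsedObject]:
    """Assign "Key: value" lines that follow an object label to that object."""
    objects: List[ParsedObject] = []
    current: Optional[ParsedObject] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key in OBJECT_LABELS:
            current = ParsedObject(OBJECT_LABELS[key])
            objects.append(current)
        elif current is not None and key in PROPERTY_TYPES:
            current.properties[PROPERTY_TYPES[key]] = value
        # Lines with keys unknown to the ontology are skipped in this sketch.
    return objects
```

Splitting on the first colon only keeps values such as "2015-09-04 10:05:00" intact.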

In some embodiments, example method 800 may include generating a data report at 840. For example, the data report may be generated by an object explorer of the data structuring system (e.g., object explorer 260 of FIG. 2). In some embodiments, the generated data report may be an interactive GUI (e.g., interactive GUI 400 of FIG. 4) that allows for free-form exploration and customization of the identified objects and properties. In some other embodiments, the generated data report may include any of the example data reports illustrated in FIGS. 5-7B and described above.

Embodiments of the present disclosure have been described herein with reference to numerous specific details that can vary from implementation to implementation. Certain adaptations and modifications of the described embodiments can be made. Other embodiments can be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the present disclosure being indicated by the following claims. It is also intended that the sequence of steps shown in the figures is only for illustrative purposes and is not intended to be limited to any particular sequence of steps. As such, it is appreciated that these steps can be performed in a different order while implementing the exemplary methods or processes disclosed herein.

What is claimed is:
 1. A system comprising: one or more computer processors; and one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: determining, based on scanning a header of an electronic data file, that the electronic data file includes data received from a first social media platform, the electronic data file including content; identifying, from a plurality of database ontologies corresponding to a plurality of social media platforms, a database ontology corresponding to the social media platform associated with the electronic data file, the database ontology defining known data objects and corresponding property types for content received from the social media platform; and parsing, based on the database ontology corresponding to the social media platform, the electronic data file into known data objects identified by the database ontology.
 2. The system of claim 1, the operations further comprising: generating a data report based on the parsing of the electronic data file.
 3. The system of claim 1, wherein parsing the electronic data file into known data objects comprises: identifying a first string in the content; comparing the first string to the known data objects defined in the database ontology corresponding to the social media platform, yielding a comparison; determining, based on the comparison, that the first string matches a first known data object defined in the database ontology corresponding to the social media platform, the first data object being of a first data object type that is associated with a first property type; and in response to determining that the first string matches the first known data object, identifying the first string as a first identified object in the content and assigning the first object type to the first string.
 4. The system of claim 3, wherein parsing the electronic data file into known data objects further comprises: identifying a second string that follows the first string in the content; and identifying the second string as a first identified property of the first identified object in the content and assigning the first property type to the second string.
 5. The system of claim 1, wherein the electronic data file associated with the legal process return is received in response to a legal process.
 6. The system of claim 5, wherein the legal process includes at least one of a warrant, a national security letter, and a subpoena.
 7. The system of claim 4, wherein the first identified data object is a private message, and the first identified property is an identifier included in the private message.
 8. The system of claim 2, wherein the data report includes at least one of a list of histogramed telephone number data report, a conversation reconstructed from one or more private messages, a login information data report, a picture mapping data report, and a shared IP address data report.
 9. A method comprising: determining, based on scanning a header of an electronic data file, that the electronic data file includes data received from a first social media platform, the electronic data file including content; identifying, from a plurality of database ontologies corresponding to a plurality of social media platforms, a database ontology corresponding to the social media platform associated with the electronic data file, the database ontology defining known data objects and corresponding property types for content received from the social media platform; and parsing, based on the database ontology corresponding to the social media platform, the electronic data file into known data objects identified by the database ontology.
 10. The method of claim 9, further comprising: generating a data report based on the parsing of the electronic data file.
 11. The method of claim 9, wherein parsing the electronic data file into known data objects comprises: identifying a first string in the content; comparing the first string to the known data objects defined in the database ontology corresponding to the social media platform, yielding a comparison; determining, based on the comparison, that the first string matches a first known data object defined in the database ontology corresponding to the social media platform, the first data object being of a first data object type that is associated with a first property type; and in response to determining that the first string matches the first known data object, identifying the first string as a first identified object in the content and assigning the first object type to the first string.
 12. The method of claim 11, wherein parsing the electronic data file into known data objects further comprises: identifying a second string that follows the first string in the content; and identifying the second string as a first identified property of the first identified object in the content and assigning the first property type to the second string.
 13. The method of claim 9, wherein the electronic data file associated with the legal process return is received in response to a legal process.
 14. The method of claim 13, wherein the legal process includes at least one of a warrant, a national security letter, and a subpoena.
 15. The method of claim 12, wherein the first identified data object is a private message, and the first identified property is an identifier included in the private message.
 16. The method of claim 10, wherein the data report includes at least one of a list of histogramed telephone number data report, a conversation reconstructed from one or more private messages, a login information data report, a picture mapping data report, and a shared IP address data report.
 17. A non-transitory computer-readable medium storing instructions that, when executed by one or more computer processors of a computing system, cause the computing system to perform operations comprising: determining, based on scanning a header of an electronic data file, that the electronic data file includes data received from a first social media platform, the electronic data file including content; identifying, from a plurality of database ontologies corresponding to a plurality of social media platforms, a database ontology corresponding to the social media platform associated with the electronic data file, the database ontology defining known data objects and corresponding property types for content received from the social media platform; and parsing, based on the database ontology corresponding to the social media platform, the electronic data file into known data objects identified by the database ontology.
 18. The non-transitory computer-readable medium of claim 17, the operations further comprising: generating a data report based on the parsing of the electronic data file.
 19. The non-transitory computer-readable medium of claim 17, wherein parsing the electronic data file into known data objects comprises: identifying a first string in the content; comparing the first string to the known data objects defined in the database ontology corresponding to the social media platform, yielding a comparison; determining, based on the comparison, that the first string matches a first known data object defined in the database ontology corresponding to the social media platform, the first data object being of a first data object type that is associated with a first property type; and in response to determining that the first string matches the first known data object, identifying the first string as a first identified object in the content and assigning the first object type to the first string.
 20. The non-transitory computer-readable medium of claim 19, wherein parsing the electronic data file into known data objects further comprises: identifying a second string that follows the first string in the content; and identifying the second string as a first identified property of the first identified object in the content and assigning the first property type to the second string.